FINAL REPORT: "Spatial hearing, attention and informational masking
in speech identification"

The following report describes our work during the award period from May 1, 2012, through
April 30, 2015. The progress report is organized according to the objectives stated in the
proposal  and  includes  short  narrative  summaries  with  representative  figures  and  a  list  of 
publications  and  presentations  supported  by  this  award.  A  more  complete  description  of  the 
findings  presented  here  in  abbreviated  form  can  be  found  in  the  referenced  publications  (pdfs 
available  upon  request)  or  in  more  extensive  written  summaries  of  unpublished  work  (also 
available  upon  request)  including  the  summaries  given  at  the  Sensory  Information  Systems 
program  reviews  in  October,  2013,  and  2014. 

I.  Collaboration 

Significant  portions  of  the  work  supported  by  this  award  have  been  conducted  through  the 
collaborative  efforts  of  the  research  groups  at  Boston  University  and  WPAFB.  These  two  groups 
and  associated  collaborators  routinely  hold  joint  discussions  of  research  at  the  annual  Acoustical 
Society  of  America  Spring  Meeting,  at  the  Midwinter  Research  Meeting  of  the  Association  for 
Research  in  Otolaryngology,  and  in  Boston  at  the  annual  Binaural  Bash  conference  sponsored  by 
the  Hearing  Research  Center  at  Boston  University.  The  specific  projects  upon  which  we  have 
collaborated  are  evident  in  the  co-authored  articles  and  presentations  at  scientific  conferences 
listed in a following section. Both of the broad areas of work proposed here (using highly
quantized representations of speech signals as a means for examining informational masking, and
the role of predictability and a priori knowledge in speech stream perception) are areas that have
been, and will continue to be, addressed by collaborative work between these two groups and our associated
consultants and other contributors. These projects reflect shared interest in the research questions
addressed  by  this  proposal  and  in  their  relevance  to  the  scientific  mission  of  AFOSR.  Our  plan 
going  forward  is  to  continue  these  collaborative  efforts  very  much  in  the  same  manner  as  in 
previous  years. 

II. Progress Achieving the Objectives of the Research Plan

The work proposed in this application continues and significantly extends the work undertaken
during  the  previous  award  period.  Some  of  the  methods  we  propose  to  use  were  developed 
during  the  past  award  period  and  some  new  applications  emerged  that  guide  our  specific 
interests.  Most  importantly,  though,  the  work  that  was  completed  from  2012-2015  achieved  its 
intended  broad  goal:  our  understanding  of  the  auditory  and  cognitive  processes  underlying 
human  communication  in  multiple-talker  sound  fields  (i.e.,  the  "cocktail  party  problem")  has 
increased  dramatically  and  leads  directly  to  the  work  that  is  planned  and  described  in  the  current 
proposal.  The  following  section  reviews  our  progress  toward  the  objectives  described  in  the  last 
application. 


A.  To  obtain  a  better  understanding  of  the  mechanisms  of  auditory  masking  using 
"pointillistic"  processing  of  speech 

The advantage of pointillistic representations of speech is that they allow precise control of the
information  that  is  available  to  the  listener  while  also  (potentially)  preserving  the  essential 
aspects  of  speech  for  intelligibility.  This  high  degree  of  control  can  be  used  to  determine  how 
well  specific  speech  features  are  maintained  under  masked  conditions  and  -  on  an  even  more 
basic  level  -  what  acoustic  information  is  needed  to  form  the  minimal  representations  of  these 
features.  Minimal/sparse  representations  could  be  used  to  reduce  the  amount  of  data  required  to 
code  speech  (potentially  improving  transmission  efficiency  for  specific  applications)  or  for 
enhancing  speech  in  ways  that  make  it  less  susceptible  to  masking  (e.g.,  emphasizing  the 
important  features  during  transmission). 

The  first  two  projects  discussed  in  this  section  provide  examples  and  summaries  of  our  work  in 
this area. The first project, in collaboration with Professor Lori Holt of Carnegie Mellon
University  (consultant  on  grant  during  previous  award  period),  examined  the  idea  of  sparse 
representations  of  one  specific  phonemic  contrast  that  forms  the  basis  for  the  categorical 
perception  of  the  consonant-vowel  (CV)  sequence  /ba/-/da/-/ga/.  The  approach  was  to  take  an 
established  natural  sequence  of  CV  pairs  that  formed  an  acoustic  continuum  that,  in  this  case, 
was  distinguished  by  the  formant  transitions  (principally  the  second  formant)  -  the  principal 
feature  upon  which  the  /ba/-/da/-/ga/  categorical  judgment  is  based. 

Figure 1 (left) shows human performance (proportion of categories chosen) in labelling
consonant-vowel pairs as /ba/, /da/ or /ga/ in a 3-alternative forced-choice procedure (without
feedback) as a function of the acoustic variable signaling phoneme identity. The primary physical
variable was the slope of the second formant transition and the details of the methods used to
construct this stimulus continuum are given in Stephens and Holt (2011). The categorical nature of
the pattern of results is seen in the sharp change from nearly 100% identification of one phoneme
(transitioning to near 0% identification of that phoneme) to nearly 100% identification of the next
phoneme along the continuum - despite the fact that each step along the abscissa is an equal change in the
physical  variable.  The  category  boundaries  for  natural/unprocessed  speech  are  not  displayed 
because  they  are  nearly  identical  to  those  shown  for  80%  retention  of  the  points  representing  the 
phonemes  (upper  left  panel  of  Figure  1).  The  purpose  of  this  work  was  to  determine  how  sparse 
a  pointillistic  representation  could  be  and  still  accurately  convey  the  feature  underlying  phoneme 
identity.  This  figure  presents  a  portion  of  that  work,  and  indicates  that  reasonably  intact  category 
boundaries  are  possible  with  only  about  60%  of  the  points  remaining  after  applying  a  process 
that  randomly  removed  points  within  the  time-frequency  matrix  representing  the  stimulus.  A 
parallel  manipulation  -  removing  the  points  below  a  particular  amplitude  that  is  specified  by  the 
experimenters  -  indicated  that  the  category  boundaries  could  be  preserved  if  only  the  points 
falling  within  the  upper  20%  of  the  amplitude  range  were  retained  and  the  remainder  discarded. 
These  manipulations  support  the  hypothesis  that  pointillistic  speech  can  be  used  to  convey 
speech  features  accurately  with  sparse  representations.  Note  that  these  manipulations  used 
random  or  arbitrary  rules  to  remove  points.  We  currently  are  exploring  ways  of  minimizing  the 
pointillistic  representations  so  that  the  feature  of  interest  is  conveyed  with  the  minimum  possible 
number of points guided by our a priori knowledge of the speech feature or by an algorithm that
uses  the  results  of  perceptual  experiments  that  determine  the  weighting  of  points  as  they 
contribute  to  intelligibility  (i.e.,  which  points  are  most  important  for  distinguishing  /ba/  from 
/da/;  those  points  are  retained  while  others  not  important  for  determining  the  category  are 
discarded).  This  work  is  ongoing. 
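
The two point-removal rules described above can be illustrated with a minimal sketch. It is not the processing code used in the experiments: the time-frequency matrix is assumed to be a nested list of point amplitudes (bands by time segments), removed points are marked as None, and "upper 20% of the amplitude range" is assumed to mean the top fifth of the interval between the smallest and largest amplitude in the matrix. The function names remove_random and keep_top_amplitude are hypothetical.

```python
import random

def remove_random(points, keep_fraction, seed=0):
    """Randomly retain a fraction of the time-frequency points.

    points: nested list [band][segment] of amplitudes (assumed layout).
    Removed points are replaced by None.
    """
    rng = random.Random(seed)
    cells = [(b, t) for b, row in enumerate(points) for t in range(len(row))]
    n_keep = round(keep_fraction * len(cells))
    kept = set(rng.sample(cells, n_keep))
    return [[v if (b, t) in kept else None for t, v in enumerate(row)]
            for b, row in enumerate(points)]

def keep_top_amplitude(points, top_fraction):
    """Retain only points in the upper part of the amplitude range.

    The cutoff is placed top_fraction of the way down from the maximum
    toward the minimum amplitude (an assumed reading of "amplitude range").
    """
    values = [v for row in points for v in row]
    lo, hi = min(values), max(values)
    cutoff = hi - top_fraction * (hi - lo)
    return [[v if v >= cutoff else None for v in row] for row in points]
```

For example, remove_random(matrix, 0.6) would correspond to the 60% retention condition, and keep_top_amplitude(matrix, 0.2) to the amplitude-based rule, under the assumptions stated above.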

A  second  project  was  completed  that  relied  on  pointillistic  speech  as  the  stimulus.  This  project 
was  not  originally  planned  under  the  objectives  stated  in  the  previous  proposal,  but  rather  grew 
out  of  a  "discovery"  that  could  have  practical  applications  of  interest  to  AFOSR.  This  work  was 
first  reported  at  the  annual  review  meeting  of  the  Sensory  Information  Systems  program  of 
AFOSR  at  Fort  Walton  Beach,  FL,  in  October,  2014  and  is  described  in  Kidd  and  Mason  (2015). 
The  basic  idea  from  an  acoustic  analysis  standpoint  is  shown  in  the  two  panels  of  Figure  2 
(below). 

The left panel shows the initial stage of the spectro-temporal analysis that leads to pointillistic
representations. The speech signal is filtered into 16 contiguous frequency bands. The Hilbert
envelope  and  phase  (instantaneous  frequency)  functions  are  obtained  for  each  band  to  provide 
the  values  for  the  points  that  substitute  for  the  original  waveform.  Within  each  band  and  time 
segment,  the  averages  of  the  envelope  magnitude  and  instantaneous  frequency  are  computed  and 
are  used  to  generate  a  pure-tone  "point"  that  replaces  the  corresponding  segment  of  the  original 
waveform.  This  process  is  illustrated  in  the  right  panel  of  Figure  2.  The  lightly  shaded  section  of 
frequency  band  six  (low  to  high)  in  the  left  panel  is  expanded  and  replotted  in  the  right  panel. 
Here  the  envelope  (smoothed  black  curve  along  the  top  of  the  waveform)  and  waveform  fine 
structure  (light  gray)  of  the  original  signal  and  the  five  10-ms  duration  ramped  pure-tone  points 
(dark  waveforms)  that  replace  it  are  depicted.  This  analysis/synthesis  is  performed  on  all  time- 
frequency  units  of  the  speech  stimulus  yielding  a  complete  pointillistic  representation  that  past 
work has shown to be highly intelligible (Kidd et al., 2009) although it lacks strong intonation
and  some  other  features  of  natural-sounding  speech.  The  key  aspect  of  this  process  for  the 
current  discussion  is  the  (lack  of)  importance  of  the  starting  phases  of  the  individual  points. 
Speech  intelligibility  experiments  performed  in  our  laboratory  during  the  previous  award  period 
have  demonstrated  that  variations  in  the  starting  phases  of  the  points  -  unlike  scrambling  the 
phase  of  natural  speech  -  do  not  affect  intelligibility.  What  this  means  is  that  the  phases  then 
become  a  "free  parameter"  that  may  be  arbitrarily  -  or  deliberately  -  specified  without  degrading 
the intelligibility of the speech signal (which will be called "the primary message"). Subsequent
work on this project has found that a secondary message may be coded by the phases of the
individual points; for example, a starting phase of 0 radians is assigned a binary value of 0 and a
starting phase of π radians is assigned a binary 1. If those values may be recovered by signal
processing  then  the  phases  may  be  converted  to  bits  forming  binary  words  (e.g.,  a  character 
specified  by  ASCII  code  or  a  word  in  a  table  look  up  in  a  predetermined  lexicon).  So  in  each 
time  segment  in  each  band  one  bit  may  be  coded  by  the  phase  of  the  pure-tone  point  without 
altering  the  intelligibility  of  the  primary  message.  A  secondary  message  may  then  be  covertly 
coded  in  the  bit  pattern  of  the  speech  signal.  We  have  pursued  this  line  of  work  with  a  goal  of 
determining  various  types  of  natural-pointillistic  hybrid  speech  that  retain  high  intelligibility, 
good  sound  quality,  and  (relatively)  high  rate  of  transmission  of  the  secondary  message  (in  the 
range  of  hundreds  of  words/sec). 
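
The phase-coding idea can be sketched as follows. This is an illustration only, not the laboratory's implementation: it handles a single isolated pure-tone point with no onset/offset ramp, assumes a 16-kHz sampling rate and 7-bit ASCII, and recovers the bit by correlating the segment with a zero-phase sine reference. All function names and constants are hypothetical.

```python
import math

FS = 16000   # sampling rate in Hz (assumed for illustration)
SEG = 160    # samples in one 10-ms point at FS

def encode_point(bit, freq_hz, amp=1.0):
    """Synthesize one pure-tone point whose starting phase carries one bit."""
    phase = math.pi if bit else 0.0   # 0 rad -> bit 0, pi rad -> bit 1
    return [amp * math.sin(2 * math.pi * freq_hz * n / FS + phase)
            for n in range(SEG)]

def decode_point(segment, freq_hz):
    """Recover the bit from the sign of the correlation with a 0-phase tone."""
    ref = [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(SEG)]
    corr = sum(s * r for s, r in zip(segment, ref))
    return 1 if corr < 0 else 0

def text_to_bits(text):
    """Convert ASCII text to a flat list of bits, 7 bits per character."""
    return [(ord(ch) >> i) & 1 for ch in text for i in range(6, -1, -1)]

def bits_to_text(bits):
    """Convert a flat list of bits (7 per character) back to ASCII text."""
    chars = []
    for i in range(0, len(bits) - 6, 7):
        code = 0
        for b in bits[i:i + 7]:
            code = (code << 1) | b
        chars.append(chr(code))
    return "".join(chars)
```

In this sketch each point carries one bit, so a secondary message is written by encoding successive bits into successive points and read back by decoding each point at its known frequency. Recovering points embedded in a full speech mixture would additionally require band filtering and segment alignment, which are omitted here.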

Three examples are shown in Figure 3 (right) for
different types of hybrid speech. For all three
examples,  near  100%  correct  intelligibility  and 
near  100%  correct  recovery  of  the  secondary 
message  (phase-coded  ASCII  characters)  were 
achieved.  Each  example  has  eight  higher-frequency 
pointillistic  bands  added  to  natural  speech.  The 
conditions  shown  are  (top  to  bottom):  natural 
speech  low-passed  at  4  kHz,  pointillistic  bands 
from  roughly  1.4-8  kHz;  broadband  natural  speech, 
pointillistic  bands  from  5-12  kHz;  and  broadband 
natural  speech,  pointillistic  bands  from  8-20  kHz. 

Work  on  this  project  continues. 

In  the  previous  two  projects,  pointillistic  speech 
was  presented  to  the  observer  in  isolation  rather 
than  in  masked  conditions.  In  the  next  project,  we 
employed  a  closely  related  approach  based  on 
"ideal  binary  masking"  or  "ideal  time-frequency 
analysis" (ITFA) that has been used to separate energetic masking (EM)
from informational masking (IM) in speech mixtures (Brungart et al., 2006).

This  approach  is  closely  related  to  pointillistic 
processing  in  that  it  yields  a  highly-quantized 
analysis  of  speech  that  is  reduced  to  minimal  time- 
frequency  (T-F)  units.  The  ITFA  approach  yields 
speech  representations  that  can  be  used  as  stimuli 
in  speech  recognition  experiments  -  just  as  with 
pointillistic  speech  -  that  allow  a  high  degree  of 
control of the stimulus. Our initial work in this area
has provided the background and
motivation for portions of the work in the current
proposal, and stems in part from the following joint
project between the group at BU led by Dr. Virginia Best and Ms. Christine Mason, the
research  group  at  WPAFB  led  by  Dr.  Nandini  Iyer,  and  the  group  at  Walter  Reed  Medical  Center 
led  by  Dr.  Douglas  Brungart.  The  goal  of  the  project  was  to  explore  the  EM-IM  contrast  for  two 
different  types  of  speech  materials  and  tests  that,  according  to  our  initial  hypothesis,  should 
differ  in  the  degree  to  which  spatial  separation  of  sources  provides  a  release  from  IM.  Portions  of 
this work are described in a manuscript in press (Best et al., 2015). Furthermore, the
experimental  approach  sought  to  separate  acoustic  "better  ear"  cues  available  in  moments  where 
the  amplitude  of  the  masker  speech  was  low  while  the  target  speech  was  correspondingly 
relatively  high,  producing  brief  "glimpses"  of  the  target  from  perceptual  segregation  due  to 
internally-generated  binaural  processing. 

For each type of speech test (the "Modified Rhyme Test" - MRT - and the "Coordinate Response
Measure" - CRM) the key comparison was between speech identification performance in a
condition with no apparent spatial separation of sources or binaural cues (colocated at 0°
azimuth) vs. two cases where binaural processing was applied. In one case, called "binaural," the
two  maskers  were  natural  speech  (unprocessed)  and  were  presented  spatialized  (via  head-related 
transfer  functions,  HRTFs)  at  ±60°  azimuth.  In  the  second  case  (called  "better-ear"),  the 
processing  described  in  detail  in  Brungart  and  Iyer  (2012)  was  applied  to  the  stimuli:  The  target 
was  always  presented  in  the  center  at  0°  azimuth.  Left  and  right  ear  masker  signals  were 
bandpass  filtered  by  a  128-channel  gammatone  filterbank  (80-5000  Hz).  The  time-domain 
outputs  from  each  channel  were  then  divided  into  20-ms  segments  (with  50%  overlap)  and 
multiplied  by  a  20-ms  raised-cosine  window.  Because  the  HRTFs  were  symmetric  across  the 
median  sagittal  plane,  the  target  signal  was  always  identical  in  the  two  ears.  Thus  it  was  possible 
to identify the "better ear" as the ear with the lower total root-mean-square (rms) energy within
each  windowed  T-F  segment.  For  every  time/frequency  segment,  the  better  ear  was  chosen  from 
the  binaural  stimulus,  the  better-ear  elements  (one  for  each  T-F  unit)  were  combined,  and  the 
resynthesized  mixture  was  presented  diotically. 
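
The per-unit better-ear selection step can be sketched as below. This is a simplified illustration rather than the processing of Brungart and Iyer (2012): it assumes the gammatone filtering and windowing have already been done, so that each ear is supplied as a nested list [channel][frame] of windowed sample segments, and it omits the overlap-add resynthesis. The names rms and better_ear_select are hypothetical.

```python
import math

def rms(segment):
    """Root-mean-square value of one windowed time-frequency segment."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))

def better_ear_select(left, right):
    """Pick, for every T-F unit, the segment from the ear with lower rms.

    left, right: nested lists [channel][frame] -> list of samples (assumed
    layout). Because the target is identical in the two ears, the ear with
    the lower total energy is the one with less masker energy in that unit.
    """
    selected = []
    for ch_left, ch_right in zip(left, right):
        selected.append([seg_l if rms(seg_l) <= rms(seg_r) else seg_r
                         for seg_l, seg_r in zip(ch_left, ch_right)])
    return selected
```

The selected segments would then be recombined across frames and channels and presented diotically, as described above.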

The  results  from  this  experiment  are  shown  in 
Figure  4  (right).  For  the  MRT  test  and 
materials,  which,  according  to  our 
interpretation  are  relatively  low  in  IM,  better- 
ear  and  binaural  conditions  yielded  about  the 
same advantage relative to colocated: a "benefit" of
approximately  5  dB.  For  the  CRM  test  and 
materials,  which  tend  to  produce  large 
amounts  of  IM,  the  advantage  for  better-ear 
was  about  6  dB,  while  the  advantage  for 
binaural  presentation  increased  to  about  9  dB. 


Thus, the conclusion was that relying on fine-grained
T-F glimpses corresponding to moment-by-moment acoustic "better-ear" advantages can
provide a significant benefit when the sound sources are symmetrically separated in azimuth.
This acoustically-determined benefit is about the same as the binaural benefit for cases where EM dominates
performance.  However,  when  the  listening  situation  is  high  in  IM,  the  benefit  from  the  apparent 
separation  of  sources  that  is  achieved  through  binaural  listening  is  significantly  greater;  in  this 
case  the  difference  was  about  3  dB  greater  than  for  better-ear  alone. 

B.  To  examine  the  role  of  listener  expectation  in  stream  formation  and  maintenance 

The  premise  upon  which  this  objective  is  based  is  that  human  listeners  exploit  a  priori 
knowledge  about  source  and  message  probabilities  to  successfully  perform  sound  source 
segregation,  selection  and  speech  recognition  in  multiple-source  sound  fields.  To  examine  the 
merits  of  this  idea,  we  completed  two  major  projects  that  tested  aspects  of  this  putative  process. 
In the first project, which was a collaborative effort among the group at BU led by Dr. Kidd, Dr.
Gregory  Wakefield  of  the  University  of  Michigan  (consultant  on  grant  during  previous  award 
period),  and  Dr.  Eric  Thompson  of  Ball  Aerospace  &  Technologies  Corporation/WPAFB 
(described in Kidd et al., 2013), the sensitivity of listeners to statistical dependencies in
sequences  of  nonspeech  sounds  was  systematically  examined.  The  basic  idea  here  is  that,  in 
order  for  listeners  to  exploit  a  priori  information  in  performing  stream  segregation  and 
maintenance,  we  must  first  determine  that  listeners  are,  indeed,  able  to  detect  the  presence  and 
judge  the  strength  of  statistical  dependencies  among  the  elements  of  sound  sequences.  To  test 
this  hypothesis  empirically,  sequences  of  nonspeech  sounds  were  presented  that  were  chosen 
from  transition  matrices  to  form  Markov  chains  (e.g.,  Rabiner,  1998).  The  underlying  perceptual 
dimensions  along  which  the  stimuli  varied  were  pitch  (pure-tone  frequency)  and  spatial  location 
(varied  according  to  interaural  time  differences  or  "ITDs").  Examples  of  sound  sequences  drawn 
from  six  states  from  either  of  two  signal  variables  (frequency/pitch  and  ITD/spatial  location)  and 
the  transition  matrices  from  which  they  were  constructed  are  shown  in  Figure  5  (below). 


[Figure 5. Transition matrices and example state sequences; recovered axis label: Event k + 1]


The  left  panel  (table)  shows  two  example  transition  matrices  comprising  six  states  A-F  which 
correspond  to  six  values  of  pure-tone  frequency  or  ITD.  The  right  panel  is  an  illustration  of  the 
sequences  of  states  that  are  "draws"  from  the  two  matrices  and  that  would  be  used  to  present  the 
sounds  in  the  psychophysical  discrimination  experiment.  Here  the  stimulus  drawn  from  the  non- 
uniform  transition  matrix  (lower  right  panel)  has  fewer  transitions  than  the  stimulus  drawn  from 
the  uniform  (random)  transition  matrix,  so  the  correct  observer  response  would  be  to  choose  the 
stimulus  in  the  right-side  lower  panel. 
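
The construction of such sequences can be sketched in a few lines. The example assumes six states, a single "stay" probability on the diagonal, and equal probability for every off-diagonal transition; the actual matrices used in the experiments may have differed, and the function names are hypothetical. A diagonal value of 1/6 gives the uniform (random) matrix.

```python
import random

def transition_matrix(p_stay, n=6):
    """n-state matrix with p_stay on the diagonal and equal off-diagonals."""
    off = (1.0 - p_stay) / (n - 1)
    return [[p_stay if i == j else off for j in range(n)] for i in range(n)]

def draw_sequence(matrix, length, rng):
    """Draw a Markov chain of state indices (0 = A, ..., 5 = F)."""
    state = rng.randrange(len(matrix))          # uniform initial state
    seq = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        seq.append(state)
    return seq

def count_transitions(seq):
    """Number of positions at which the state changes."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)
```

Each state index would then be mapped to one of the six pure-tone frequencies or ITDs. With a large diagonal value, the drawn sequence changes state less often than a sequence drawn from the uniform matrix, which is the property the listener is asked to detect.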

The results from the psychophysical experiment testing
the discriminability of sequential dependencies for the
two perceptual variables - pitch and location - are shown
in Figure 6 (right). This figure shows percent correct
discrimination for sequences of sounds (2-interval
2-alternative forced-choice) based on the difference in the
strength  of  the  sequential  dependency  generating  the 
sequences  (diagonal  entries  in  matrix).  Here  we  found 
that  listeners  were  quite  sensitive  to  the  statistical 
properties  of  sequences  of  sounds  when  the  "states"  in 
the  transition  matrices  were  frequency  or  apparent 
location  (consistently  better  performance  for  frequency) 
but  performed  notably  less  well  than  the  Ideal  Observer; 
that  monitoring  two  concurrent  streams  separated  by 
frequency  and  by  ear  of  presentation  was  possible  and 
resulted  in  performance  well  above  chance  (not  shown) 
but  was  poorer  than  comparable  performance  when  the 
streams  were  successively  presented.  Performance  was 
not  affected  by  the  two  durations  tested.  Overall,  this 
work  was  interpreted  as  supporting  the  underlying 
hypothesis  that  statistical  dependencies  can  be  exploited  by  listeners  in  forming  and  maintaining 
perceptual  streams.  Furthermore,  this  work  served  as  a  successful  feasibility  test  for  an  empirical 
approach  (using  Markov  chains  as  stimuli)  to  assess  perceptual  stream  formation  and 
maintenance  under  masked  conditions.  An  Ideal  Observer  approach  to  modeling  also  seems  to  be 
useful  in  helping  to  understand  human  limitations  on  performance. 
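
One generic way to formulate an ideal observer for the two-interval task is a likelihood-ratio rule, sketched below. This is an assumption made for illustration and is not necessarily the model described in Kidd et al. (2013): the observer is taken to know both transition matrices exactly, initial states are assumed equally likely (so they cancel), and the interval with the larger log-likelihood ratio is chosen as the one containing the sequential dependency. The function names are hypothetical.

```python
import math

def transition_matrix(p_stay, n=6):
    """n-state matrix with p_stay on the diagonal and equal off-diagonals."""
    off = (1.0 - p_stay) / (n - 1)
    return [[p_stay if i == j else off for j in range(n)] for i in range(n)]

def log_likelihood(seq, matrix):
    """Log-probability of the observed transitions under a given matrix."""
    return sum(math.log(matrix[a][b]) for a, b in zip(seq, seq[1:]))

def ideal_choice(seq1, seq2, m_dependent, m_uniform):
    """Return 1 or 2: the interval more likely drawn from m_dependent."""
    llr1 = log_likelihood(seq1, m_dependent) - log_likelihood(seq1, m_uniform)
    llr2 = log_likelihood(seq2, m_dependent) - log_likelihood(seq2, m_uniform)
    return 1 if llr1 >= llr2 else 2
```

Simulating this rule over many trials at each diagonal value would give an upper bound on percent correct against which human performance could be compared.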


[Figure 6 axis label: Transition Probability]


In the second major project under this objective (complete description in Kidd et al., 2014), the
goal  was  to  test  the  hypothesis  that  listener  expectation  could  be  beneficial  in  maintaining  the 
focus  of  attention  on  a  target  stream  of  speech  in  competition  with  other  masking  streams  of 
speech.  This  work  tested  the  hypothesis  -  similar  to  that  examined  in  the  preceding  work  above  - 
that  a  priori  knowledge  about  the  probability  of  occurrence  of  the  elements  comprising 
perceptual  streams  can  be  exploited  by  the  observer  to  segregate  and  selectively  attend  to  one 
specific  source  among  competing  sources.  However,  in  this  project  the  target  sequence  was 
intelligible  speech  as  were  the  competing  masking  sources.  The  a  priori  information  that  was  the 
primary  controlled  variable  was  the  conformance  to  a  known  and  valid  syntactic  structure:  in  the 
critical  comparison,  the  target  speech  was  syntactically  correct  vs.  syntactically  incorrect 
(random  words).  Two  other  main  variables  were  also  tested:  the  predictability  of  low-level 
segregation  cues  (reliable  spatial  location  of  target  source  [open  symbols]  or  target  talker  voice 
[filled  symbols])  and  the  interaction  between  syntactic  structure  and  IM  value  of  the  masker 
(speech  vs.  noise  control). 

Principal findings from this study are summarized in Figure 7 (right). These are group mean
performance-level functions (proportion correct as a function of target-to-masker ratio) for a
target talker and two independent noise maskers (left panel) that formed the high EM control and
for two independent speech maskers (right panel) that created large amounts of IM. First, note that the
functions in the left panel have steep slopes and vary little across the different parameters tested
(syntactically correct or not; target voice cue vs. target location cue). In fact, the
"thresholds" (midpoints of the functions) did not differ significantly across conditions. In
contrast, the functions in the right panel for the speech maskers were much shallower, consistent
with  a  high  degree  of  IM  (Kidd  et  al.,  1998)  and  the  computed  thresholds  confirm  that 
conformance  to  a  correct  syntax  yielded  significantly  better  performance  than  for  incorrect  target 
sentence  syntax.  This  work  is  important  because  it  suggests  that  a  priori  knowledge  about  the 
syntactic  category  of  impending  words  in  a  sentence  assists  the  listener  in  segregating  and 
following  over  time  one  specific  stream  of  speech  mixed  with  unwanted  speech  sources. 

attending to the primary message. Under many conditions, both the primary and secondary messages
were  fully  decipherable  through  listening  (primary  message)  and  signal  processing  (secondary  message). 
In the area of predictability and expectation, the resilience of streams of speech to perceptual and cognitive
intrusions by competing sounds was explored on a linguistic level by varying syntactic structure and by
using more formal means for varying predictability by constructing target sequences of sounds from
transition matrices (i.e., Markov chains). The main finding was that listeners could extract a target stream
more  successfully  when  it  contained  sequences  of  elements  that  were  statistically  predictable  as  specified 
by  the  transition  matrix.  This  approach  was  judged  as  successful  in  improving  our  understanding  of  the 
human  processing  of  streams  of  speech  masked  by  competing  streams  of  speech. 